ESQL: Fix LOOKUP JOIN with limit #120411
I plan to add more optimizer tests, but the production code changes are complete, so I'm undrafting already.
Pinging @elastic/es-analytical-engine (Team:Analytics)
I've started playing with the code. Found something, putting it here, sorry for not digging through it due to time constraints. I've added some more rows to
This one results in
Nice find, @astefan. I can reproduce this, also with [...]. However, the problem is already present on main (also for [...]). I'll create a separate issue for this.
LGTM
Thanks Alex, LGTM!
Use a helper function for all the similar as(...) calls for Limit.class.
Nice. Left two highly optional notes.
ROW language_code = 1
| MV_EXPAND language_code
| LIMIT 1
| SORT language_code
| LIMIT 3
This case used to ignore the LIMIT 1 at the beginning of the plan because it was absorbed into the MV_EXPAND. Now that the LIMIT 1 is duplicated rather than pushed down + absorbed, the downstream LIMIT 3 realizes that it's actually a LIMIT 1 due to the upstream LIMIT 1.
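To make the comment above concrete, here is a minimal sketch of how two stacked limits behave. It runs on in-memory values, not on the ES|QL engine; `StackedLimitsSketch`, `effectiveLimit` and `run` are hypothetical names introduced only for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

final class StackedLimitsSketch {
    // Two limits applied one after the other can never let through
    // more rows than the smaller of the two.
    static int effectiveLimit(int upstream, int downstream) {
        return Math.min(upstream, downstream);
    }

    // Mimics "... | LIMIT upstream | SORT | LIMIT downstream" on a list of values.
    // The sort only ever sees the rows that survived the upstream limit.
    static List<Integer> run(List<Integer> rows, int upstream, int downstream) {
        return rows.stream()
            .limit(upstream)
            .sorted()
            .limit(downstream)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(effectiveLimit(1, 3));
        System.out.println(run(List.of(3, 1, 2), 1, 3));
    }
}
```

With an upstream limit of 1, the downstream limit of 3 can keep at most one row, which is the behavior described for the query in this thread.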
Thanks a lot for the reviews everyone!
💚 Backport successful
For queries like ... | LOOKUP JOIN lookup_index ON key | LIMIT 10 the limit cannot be simply pushed past the join - but it can be duplicated past the join. In such cases, leave an explicit Limit plan node downstream from the Join (in addition to pushing down the limit), but mark it in a way that prevents being duplicated multiple times (which would cause infinite loops). Align the logic for MV_EXPAND, which used to, instead, internalize a limit into the MvExpand node.
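The difference between pushing and duplicating the limit can be illustrated on plain lists. This is a sketch under assumptions, not the ES|QL implementation: it models the join as a left join in which one input row may match several lookup rows, and `JoinLimitSketch`, `lookupJoin` and `limit` are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class JoinLimitSketch {
    // Left join: one output row per match, or one row with "null" when nothing matches.
    static List<String> lookupJoin(List<Integer> keys, Map<Integer, List<String>> lookup) {
        List<String> out = new ArrayList<>();
        for (int key : keys) {
            List<String> matches = lookup.get(key);
            if (matches == null) {
                out.add(key + ":null");
            } else {
                for (String match : matches) {
                    out.add(key + ":" + match);
                }
            }
        }
        return out;
    }

    // Keeps at most n rows, preserving order.
    static <T> List<T> limit(List<T> rows, int n) {
        return new ArrayList<>(rows.subList(0, Math.min(n, rows.size())));
    }

    public static void main(String[] args) {
        List<Integer> keys = List.of(1, 1, 2, 3);
        Map<Integer, List<String>> lookup = Map.of(1, List.of("a", "b"), 2, List.of("c"));

        // Original query shape: join, then limit.
        System.out.println(limit(lookupJoin(keys, lookup), 2));
        // Limit only pushed below the join: the join can expand the rows again.
        System.out.println(lookupJoin(limit(keys, 2), lookup));
        // Limit duplicated: applied below and above the join.
        System.out.println(limit(lookupJoin(limit(keys, 2), lookup), 2));
    }
}
```

In this toy data, pushing the limit alone yields four rows for `LIMIT 2`, while keeping a second limit above the join restores the two-row cap.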
* ESQL: Fix LOOKUP JOIN with limit (#120411)
* Fix compilation
Relates #118781.
This is like #115624, in spirit, but for LOOKUP JOIN.

For queries like

... | LOOKUP JOIN lookup_index ON key | LIMIT 10

the limit cannot simply be pushed past the join, but it can be duplicated past the join. The same currently happens with MV_EXPAND.

This PR solves this by leaving an explicit Limit plan node downstream from the Join (in addition to pushing down the limit), but marks it in a way that prevents it from being duplicated multiple times (which would cause infinite loops).

In contrast to the approach used for MV_EXPAND, the downstream limit is not internalized into the Join node. Instead, we mark it with a boolean attribute as un-pushable past MvExpand and Join nodes. The fix for MV_EXPAND is updated and aligned with the approach for LOOKUP JOIN.

This approach was discussed with @luigidellaquila. I believe it is preferable to having a limit attribute on MvExpand and Join (+ subclasses), because it means that the logic of pushing down limits remains local to the Limit class, and the mapper remains oblivious to all this, too. Also, any optimization rules (current and future ones) that do anything with limits continue to work without additional special casing for MvExpand and Join. (PushDownAndCombineLimits is already affected by this in an edge case.)
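The duplicate-and-mark idea described in this PR can be sketched on a toy plan tree. This is not the actual `Limit` or `PushDownAndCombineLimits` code: the record types, the `duplicated` field name, and the `apply` and `toFixpoint` helpers are all assumptions made for illustration, and details such as combining adjacent limits are left out.

```java
final class DuplicateLimitSketch {
    interface Plan {}
    record Source(String index) implements Plan {}
    record Join(Plan left, String lookupIndex) implements Plan {}
    // 'duplicated' is a hypothetical name for the boolean marker on the limit.
    record Limit(int n, Plan child, boolean duplicated) implements Plan {}

    // One top-down pass of a simplified push-down rule.
    static Plan apply(Plan plan) {
        if (plan instanceof Limit limit) {
            if (limit.child() instanceof Join join && !limit.duplicated()) {
                // Copy the limit below the join and keep a marked copy above it.
                Plan pushed = new Limit(limit.n(), join.left(), false);
                return new Limit(limit.n(), new Join(apply(pushed), join.lookupIndex()), true);
            }
            return new Limit(limit.n(), apply(limit.child()), limit.duplicated());
        }
        if (plan instanceof Join join) {
            return new Join(apply(join.left()), join.lookupIndex());
        }
        return plan;
    }

    // Re-applies the rule until the plan stops changing, as an optimizer batch would.
    static Plan toFixpoint(Plan plan, int maxRounds) {
        for (int round = 0; round < maxRounds; round++) {
            Plan next = apply(plan);
            if (next.equals(plan)) {
                return plan;
            }
            plan = next;
        }
        throw new IllegalStateException("rule did not converge");
    }

    public static void main(String[] args) {
        Plan query = new Limit(10, new Join(new Source("idx"), "lookup_index"), false);
        System.out.println(toFixpoint(query, 10));
    }
}
```

Because the upper limit carries the marker after the first pass, the second pass leaves the plan unchanged and the loop terminates. Without the check on the marker, every pass would insert another limit below the join and the rule would never reach a fixpoint.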